Policy learning in continuous-time Markov decision processes using Gaussian Processes



Similar resources

Continuous time Markov decision processes

In this paper, we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions and prove, under these conditions, the existence of both average cost optimal stationary policies and a solution of the average optimality equation. The results in this paper are applied to an admission con...

Solving Structured Continuous-Time Markov Decision Processes

We present an approach to solving structured continuous-time Markov decision processes. We approximate the optimal value function by a compact linear form, resulting in a linear program. The main difficulty arises from the number of constraints, which grows exponentially with the number of variables in the system. We exploit the representation of continuous-time Bayesian networks (CTBNs) to de...

Learning Qualitative Markov Decision Processes

To navigate in natural environments, a robot must decide the best action to take according to its current situation and goal, a problem that can be represented as a Markov Decision Process (MDP). In general, it is assumed that a reasonable state representation and transition model can be provided by the user to the system. When dealing with complex domains, however, it is not always easy or pos...

Time-Bounded Reachability in Continuous-Time Markov Decision Processes

This paper solves the problem of computing the maximum and minimum probability to reach a set of goal states within a given time bound for locally uniform continuous-time Markov decision processes (CTMDPs). As this model allows for nondeterministic choices between exponentially delayed transitions, we define total time positional (TTP) schedulers which rely on the CTMDP’s current state and the ...

Journal

Journal title: Performance Evaluation

Year: 2017

ISSN: 0166-5316

DOI: 10.1016/j.peva.2017.08.007